ABSTRACT


A file management system has been implemented on top of the
operating system CP/M 1.4 for microcomputers. It is a part
of a group project titled "Implementation of RDB/M - a
relational data base on microcomputers". Two types of file
structure, namely (i) Hashed and (ii) ISAM, have been
provided. They have been designed particularly for floppy
disk as the storage medium. However, they can be adapted to
other media by simply changing the module that decides on
physical placement of blocks. A set of routines has been
written to access both the file structures by means of a
uniform interface. An efficient buffer management scheme
has been adopted to minimise secondary storage accesses.
Performance analysis of both the file organizations for the
particular implemented structure has been carried out.
CHAPTER 1


INTRODUCTION


1.1 MOTIVATION

An efficient implementation of a data base management
system is dependent on the underlying secondary storage
structure. Since data bases cannot generally be
accommodated in the main memory, the secondary storage
devices will have to be accessed time and again to make
retrievals and updates to the data base.

A system that minimises the secondary accesses would
have better performance.

For a microcomputer system the problem is not only with
slow secondary storage devices but also with very limited
main memory and secondary storage space. The commonly used
secondary storage device for a microcomputer system is a
floppy diskette, which has a limited storage capacity and a
very high access time.

In answering a query in a relational data base, a large
number of secondary accesses have to be made because the
query languages are as powerful as relational algebra.

Keeping in view the problems with a microcomputer system,
the following objectives have been set in designing a
storage structure:

(i) minimize the number of disk accesses

(ii) use as little memory space as possible

(iii) provide an efficient access method independent of the
storage structure

This is an initial attempt to support a relational data
base scheme on microcomputers. This has been taken up as a
group project. In listing requisite features for a
hypothetical relational data base, Kim [KIM79] describes nine
features. After giving careful thought to the features and
needs of such a system on a microcomputer, four features have
been chosen, namely:

(i) to support an efficient file structure and efficient
access path to the data base

(ii) an interface for a high level non-procedural language
for query, data manipulation, data definition and data
control

(iii) integrity control

(iv) recovery from both soft and hard crashes.

This thesis is aimed at providing the first feature in
the above list.

The prime motivation for taking up such an activity is

(i) abundance of microcomputers and trend towards
distributed computing

(ii) an interesting exercise in implementing a data base to
be aware of the problems and issues involved in an
implementation.

1.2 RESULTS

The following storage organizations have been
implemented:

(1) Hashed

(2) ISAM

An access method interface independent of the
underlying structure has been provided. Both the
organisations use a buffering scheme to minimise secondary
storage accesses.

1.3 DATA BASE MANAGEMENT SYSTEMS

One of the widespread uses of computers is
storing/retrieving data from real life. A data base is a
collection of such data stored in a computer. A data base
management system (DBMS) is the software which provides an
abstract view of the data and allows one or more users to
modify the data using operators in the abstract view. It is
a generalized tool for manipulating data [FRSB76].
A DBMS is designed with the following objectives:

(1) to provide an abstract integral view of data available
to users

(2) to provide logical data independence and physical data
independence

(3) to allow centralized control of the data base

(4) to provide a friendly user environment

(5) to ensure quality and integrity of data

(6) to ensure privacy and security

(7) to provide ability to recover the system from failures

The purpose of a DBMS is to spare the user the trouble of
learning the details of data storage and its access
mechanism. The user deals with the logical model and does not
have to worry about the machine representation of his data.
In providing all these, it assures integrity, security and
privacy of data. In general a DBMS is an efficient manager
of storing, retrieving and updating data in a data base in an
integrated manner.

1.4 RELATIONAL DATA MODEL


Data about an entity is the values of attributes that
describe it. Once data is captured the question arises how
to store it physically in the data base. The model that is
adopted has mainly two elements [ULLMA2]:

(i) a mathematical notation for expressing data and
relationships, and

(ii) operations on the data that serve to express queries
and manipulation of the data

Three standard models that are in use are (i)
Relational (ii) Network (iii) Hierarchical.

The mathematical concept behind the relational model is
the set theoretic relation. This model was first proposed by
Codd [CODD70] in 1970, who also developed its theoretical
basis. A relation consists of a set of tuples with each
member having the same set of attributes. A relation with
simple domains can be represented in a tabular form with no
duplicate tuples. The attributes with atomic values are
represented by columns in a relation table. A tuple is a
mapping from attribute names to values in their domains.

The relational model eliminates the users' need to
know the representation of their data. Hence an efficient
storage system could be organized on this model which could
be totally transparent to the end users.


A data base system is fully relational if it supports
[CODD79]

(1) the structural aspect of the relational model

(2) the insert-update-delete rules

(3) a data sublanguage as powerful as the relational algebra

A typical relation is shown in Fig. 1.1.

The operations that are required to manipulate data in
a relational model are described by Codd in his pioneering
paper [CODD70]. Operators to select tuples on a condition
or to connect data across relations are also available,
besides basic set theoretic operators like union,
difference and cross product.

The collection of relation schemes used to represent
information is called a relational data base scheme, and the
time varying collection of tuples belonging to relations in
a scheme, all of which can be accessed and updated, is
called a relational data base.

A true relational data base should possess the
following properties [CHAMB76]:

(1) All information is represented by data values. No
essential information is contained in invisible connections
among records.

(2) At the user interface, no particular access path is
preferred over any other.

(3) The user interface is independent of the means by which
data is physically stored.

The relational model has many advantages over the other
two models:

(1) simplicity

(2) data independence

(3) symmetry, i.e. no particular type of query is answered
faster than any other type

(4) strong theoretical support

The only disadvantage of the relational model is that the
implementation becomes cumbersome as it tries to take the
burden of access to the data away from the user.


There are over a dozen existing relational systems
implemented on main frames and minicomputers. A survey of
some of them has been made in [KIM79].
1.5 ORGANIZATION OF THE THESIS


Chapter 1 gives an introduction to DBMS and the
relational model.

Chapter 2 describes physical data organizations with
their merits and demerits in general. It briefly describes
the operating system for the current implementation and the
file structure.

The structure and organization of the two file systems
provided and the access methods interface are described in
Chapter 3.

Chapter 4 gives a functional level description of the
file management system.

Chapter 5 analyses the need for buffers and the related
data structures and describes the buffer allocation
algorithm. The performance analysis of the particular file
structures is carried out in Chapter 6. The concluding
Chapter 7 summarizes the work done and throws some light on
future work.



CHAPTER 2


PHYSICAL DATA ORGANIZATION


2.1 FILE SYSTEMS

The simplest way to store a relation of a relational
scheme is as a file with records as tuples and fields
representing the attributes of the tuples. For a small
relation a heap organization would be better in terms of
space, but in a micro system heap organisation would have a
poor response.

A file is a collection of fixed or variable size
records. A record is the logical unit of access and for the
relational model it is a tuple. In a file the collection of
attributes that identify a record uniquely is called a key.
A primary index on the file consists of a collection of
entries, each corresponding to a data record and containing
the value of the key for that record. A secondary index for a
set of attributes that may participate in a key for the
relation is a relationship between the domain of these
attributes and the set of records in the file.
The basic file organizations are sequential, index
sequential (ISAM), direct or hashed, indexed etc.

These organisations are based on a primary index. There
are a few other organizations based on secondary indices such
as inverted files, list and multilist structures. A good
description of various files and their performance analysis
is available in [UEID2].

For a relational data base system the following file
organizations are typically provided:

(i) sequential (ii) index sequential (iii) direct (hashed)
(iv) inverted [KIM79].

For an efficient implementation of the relational
system, the storage structure plays an important role
because the response time to a query is very much dependent
on the underlying storage structure and the access
mechanism.

Keeping this in view, two basic structures, namely (i)
Hashed (ii) ISAM, have been supported by the current
implementation.



2.2 CP/M - The O.S. for current implementation


The file structures depend on the environment in which
they play their role. The basic factor in this is the
operating system. For the current implementation CP/M
(Control Program for Microcomputers) has been chosen as the
operating system. The choice was rather obvious because

(i) It is very popular and widely used. Most microcomputers
in the country today support this.

(ii) It is very versatile and can suit any kind of
environment. Many standard packages are available on this
as utilities.

CP/M is a single user interactive operating system
running on Z-80 or INTEL 8080/8085 processor based systems.
Details of the operating system are available in
[CPMIG], [CPMUG], [CPMAG].

2.3 DISKETTE ORGANIZATION IN CP/M

The widely used secondary device for a microcomputer is
a diskette (floppy). The diskettes come in various standard
sizes and capacities. The organization on a floppy provides
the basic design parameters for the file system. The
following are the specifications of relevance to us:

> single sided single density, IBM compatible

> 8-inch diameter and soft sectored

> 26 sectors/track of 128 bytes each

> 8 sectors per block

> User available space of 241K bytes

The physically adjacent sectors are not logically
sequential sectors in CP/M. This has been done with a view
to support systems with lesser memory so that computations
on the data from a previous sector are completed before the
next sector is available below the head. A cluster
allocation map is usually provided to find out the
sequential sectors. A sector is the basic unit of transfer
between the CPU and the diskette.
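
The interleaving of logical and physical sectors can be illustrated with a small sketch. The skew factor of 6 is an assumption: it is the value commonly quoted for standard 8-inch single density CP/M diskettes, and the text above does not state it. The function name is hypothetical.

```python
SECTORS_PER_TRACK = 26   # from the diskette specification above
SKEW = 6                 # assumed skew factor, not stated in the text


def skew_table(sectors=SECTORS_PER_TRACK, skew=SKEW):
    """Return the physical sector (1-based) for each logical sector in order."""
    table, used = [], set()
    phys = 0
    for _ in range(sectors):
        while phys in used:               # slot already taken: slide to next free one
            phys = (phys + 1) % sectors
        table.append(phys + 1)
        used.add(phys)
        phys = (phys + skew) % sectors    # leave a gap before the next logical sector
    return table
```

With these values consecutive logical sectors lie six physical sectors apart, which gives the processor time to finish with one sector before the next one arrives under the head.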

2.4 CP/M FILE STRUCTURE


We, as designers of a file system, would be more
interested in the file structure provided by the operating
system. In CP/M version 2.0 and higher both sequential and
random files are supported. An ASCII file is a sequence of
ASCII characters, each line being marked by a carriage
return and line feed. The end of the ASCII file is marked
by Control-Z. The file size could be as high as the
diskette capacity. A file is divided into logical segments
of 16K bytes each called the "EXTENT". Each extent is
accessible through a control block called the File Control
Block (FCB). The structure of the FCB is given in Fig. 2.1.
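
The relation between sectors, blocks and extents can be worked through as simple arithmetic. The sketch below uses only the figures quoted in the text (128-byte sectors, 8 sectors per block, 16K bytes per extent). The function name is hypothetical, and treating an empty file as occupying one extent is an assumption.

```python
SECTOR_BYTES = 128
BLOCK_BYTES = 8 * SECTOR_BYTES      # 8 sectors per block = 1K bytes
EXTENT_BYTES = 16 * 1024            # one extent, described by one FCB


def file_layout(size_bytes):
    """Return (sectors, blocks, extents) needed for a file of the given size."""
    sectors = -(-size_bytes // SECTOR_BYTES)            # ceiling division
    blocks = -(-size_bytes // BLOCK_BYTES)
    extents = max(1, -(-size_bytes // EXTENT_BYTES))    # assumed: at least one extent
    return sectors, blocks, extents
```

A full extent of 16K bytes therefore covers 128 sectors or 16 blocks, and a 40000-byte file needs 313 sectors, 40 blocks and 3 extents.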
CP/M adopts a named file structure consisting of three
parts:

(1) a disk drive

(2) one to eight non-blank characters as name

(3) zero to three characters as extension


The FCBs are stored in a logically distinct area in the
diskette and a directory is maintained by CP/M. A maximum of
64 such entries can be accommodated in a diskette of the kind
described in the previous section.
DR       => Drive
F1..F8   => File Name
T1..T3   => Extension of file
EX       => Extent
R0..R2   => Used for random files
S1, S2   => Reserved for system use
RC       => Record Count
D0..D15  => Block nos.
CR       => Current Record


Fig. 2.1: File Control Block
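
As an illustration of Fig. 2.1, the fields can be unpacked from a raw 36-byte control block. This is only a sketch: the byte order used (DR, F1-F8, T1-T3, EX, S1, S2, RC, D0-D15, CR, R0-R2) follows the usual CP/M 2.x layout and is an assumption here, as is the convention that a zero block number marks an unused slot. `parse_fcb` is a hypothetical name.

```python
import struct

# DR, F1-F8, T1-T3, EX, S1, S2, RC, D0-D15, CR, R0-R2 = 36 bytes (assumed order)
FCB_FORMAT = "<B8s3sBBBB16sB3s"


def parse_fcb(raw):
    """Split a raw 36-byte FCB into its named fields."""
    dr, name, ext, ex, s1, s2, rc, blocks, cr, rnd = struct.unpack(FCB_FORMAT, raw)
    return {
        "drive": dr,
        "name": name.decode("ascii").rstrip(),     # names are blank padded
        "ext": ext.decode("ascii").rstrip(),
        "extent": ex,
        "record_count": rc,
        "blocks": [b for b in blocks if b != 0],   # assumed: 0 means unused
        "current_record": cr,
        "random_record": rnd[0] | (rnd[1] << 8) | (rnd[2] << 16),
    }
```

The 16 block number bytes correspond to the 16 blocks of 1K bytes that make up one 16K extent.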

The BDOS (Basic Disk Operating System) module of CP/M
provides various system functions for file operations such
as creating a file, opening a file, deleting a file, writing
and reading a sector of the file etc. Table 2.1 describes
the file handling BDOS functions.

A detailed list of functions and the way to access them
is available in [CPMIG].

Table 2.1

        Function No.    Function Name

(1)     15              Open file
(2)     16              Close file
(3)     17              Search for first
(4)     18              Search for next
(5)     19              Delete file
(6)     20              Read sequential
(7)     21              Write sequential
(8)     22              Make file
(9)     23              Rename file
(10)    26              Set DMA address
(11)    27              Get addr (ALLOC)



CHAPTER 3


FILE ORGANIZATIONS


3.1 TWO FILE STRUCTURES

As mentioned in section 2.1 the two types of file
organizations in the current implementation are (i) Hashed
(ii) ISAM. In the later sections the structures of both the
files are discussed.

3.2 HASHED FILE STRUCTURE

In a hashed file, the records of the file are
distributed among a fixed number of buckets. A bucket could
be one physical block or more. A bucket directory is
maintained where each entry points to the block number for
the bucket. The bucket to which a record should go is
decided by its key value. The key value is hashed by a hash
function which decides uniquely the bucket number.

The important parameters in a hash file design are the
following:

(i) Hash function

(ii) Bucket directory size

(iii) Handling bucket overflows

(iv) Obtaining the key value from the attributes

The hash function in the current implementation is the
'MOD' function:


H(k.v.) = (k.v.) MOD (bucket dir size)

where k.v. is the key value and MOD is the usual modulo
operator.

The decision on bucket directory size has been left to the
user. Four different sizes of bucket directory have been
provided. Irrespective of the file size the user can choose
any one of them. The table given below describes them.

        TYPE            Buck Dir Size

(i)     SMALL           17
(ii)    MEDIUM          37
(iii)   LARGE           67
(iv)    VERY LARGE      131

Bucket overflow is handled by chaining. If the current
block in the bucket is full a new block is acquired and
chained to the current one through the 'NEXTBLK' field. In
case of a look up the whole set of blocks linked together
has to be searched, and the bucket directory points to the
header block in the chain.
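
The chaining scheme can be sketched in memory as follows. This is an illustration under stated assumptions, not the routines of the implementation: the class and method names are hypothetical, keys are already integers, blocks are Python objects rather than 1K disk blocks, and insertion simply walks the chain to find room.

```python
class Block:
    def __init__(self):
        self.records = []      # (key, data) pairs held in this block
        self.nextblk = None    # NEXTBLK: next block in the chain, if any


class HashedFile:
    def __init__(self, dir_size=17, recs_per_block=4):
        self.dir_size = dir_size
        self.recs_per_block = recs_per_block
        self.directory = [None] * dir_size    # header block of each bucket

    def insert(self, key, data):
        b = key % self.dir_size               # the MOD hash function
        if self.directory[b] is None:
            self.directory[b] = Block()
        blk = self.directory[b]
        while len(blk.records) >= self.recs_per_block:
            if blk.nextblk is None:           # chain is full: acquire a new block
                blk.nextblk = Block()
            blk = blk.nextblk
        blk.records.append((key, data))

    def lookup(self, key):
        blk = self.directory[key % self.dir_size]
        while blk is not None:                # search every block in the chain
            for k, data in blk.records:
                if k == key:
                    return data
            blk = blk.nextblk
        return None
```

A look up costs one block access per block in the chain, which is why the bucket directory size matters: a larger directory keeps the chains short.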
To get an integer value from the set of key attributes
the following procedure has been adopted as described in
[KNU3]:

(1) Treat the value of the attributes as a sequence of bits.
In case of more than one attribute concatenate the bit
sequences in the same order they appear in the relation.

(2) Make groups of 16 bits each.

(3) Get the integer value of each member.

(4) Sum up the integers obtained in this way.

The structure of the Hashed file is shown in Fig. 3.1.
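
The four steps above, followed by the MOD hash function, can be sketched as below. Several details are assumptions that the text does not fix: attributes are given as byte strings, a trailing odd byte is padded with zero bits, each 16-bit group is read high byte first, and the sum is not truncated. The function names are hypothetical.

```python
def key_to_int(attributes):
    """Fold the key attributes into one integer by summing 16-bit groups."""
    bits = b"".join(attributes)          # step 1: concatenate in relation order
    if len(bits) % 2:                    # assumed: pad to a whole 16-bit group
        bits += b"\x00"
    # steps 2 to 4: split into 16-bit groups, read each as an integer, sum them
    return sum(int.from_bytes(bits[i:i + 2], "big") for i in range(0, len(bits), 2))


def bucket_no(attributes, dir_size=17):
    """H(k.v.) = (k.v.) MOD (bucket dir size)."""
    return key_to_int(attributes) % dir_size
```

For example, the single attribute `b"AB"` gives the integer 16706 (hexadecimal 4142), which falls in bucket 12 of a SMALL directory of 17 buckets.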

The bucket directory has two pointers, one pointing to
the header block in the chain; the other points to the free
available space in the bucket. The latter is used while
writing a record in the bucket, without going through the
chain of blocks to find the free space.

The bucket directory and other control information like
record size, total number of records in the file, number of
records in a physical block, number of attributes in the
relation and their sizes in bytes, and key fields are stored
in the first block of the file, called the control block.
This information is loaded into main memory while opening
the file.
FREE PTR:

! BLKNO ! OFFSET !
(2) a field NEXTBLK pointing to the next logical block.

(3) a BITMAP indicating whether the corresponding record in
the block is deleted.

Fig. 3.2 describes the structure of each block in the
buckets.
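
A per-block BITMAP of this kind can be sketched as one bit per record slot, with 0 meaning valid and 1 meaning deleted. The class name, the method names and the bit ordering within each byte are assumptions made for illustration.

```python
class DeleteBitmap:
    def __init__(self, n_records):
        # one bit per record slot; all bits start at 0, i.e. every slot is valid
        self.bits = bytearray((n_records + 7) // 8)

    def mark_deleted(self, slot):
        self.bits[slot // 8] |= 1 << (slot % 8)

    def is_deleted(self, slot):
        return bool(self.bits[slot // 8] & (1 << (slot % 8)))
```

Deleting a record then touches a single bit and leaves the record data in place, so no other record in the block has to move.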

The first block is different from the rest of the
blocks and is used to store control information for the file
and the bucket directory. Fig. 3.3 describes this first
block, which is different from the data blocks used by
buckets.


! DATA AREA ! BITMAP ! NEXTBLK !


Fig. 3.2: A data block in hash file
CONTROL BLOCK

! HASH (4 BYTES)                    !
! RECSIZE (2 BYTES)                 !
! NO. OF RECORDS/BLOCK (2 BYTES)    !
! NO. OF ATTRIBUTES (2 BYTES)       !
! SIZE OF EACH ATTRIBUTE            !
! KEY ATTRIBUTE(S)                  !
! BUCKET DIRECTORY (4 SECTORS)      !


Fig. 3.3: Control block in hash file

It may be noted here that the real available space for
the data in the data blocks is not 1K bytes, as about 34
bytes are reserved for the two fields in the block shown in
Fig. 3.2.

The usual standard algorithms for look up, insertion,
deletion and update of a record have been adopted. The set of
routines for manipulation will be discussed in the next
chapter.


3.3 ISAM FILE STRUCTURE


The Indexed Sequential file attempts to overcome the
problems encountered in an ordinary sequential file but
preserves all its benefits. It gives a better random access
than the sequential file. In this the file is sorted in
ascending order on its key value. An index is built on the
value of key fields. Instead of searching the whole data
area, first the index area is scanned to get the zone of
availability of the record in the file. The presence of an
overflow pointer makes insertion easier.

In this the following are the design considerations:

(1) Primary data area

(2) Overflow data area structure

(3) Indexing level

The overall file structure consists of an index, a
primary data area and an overflow data area as shown in
Fig. 3.4.


Blocks in the primary data area have the following
structure:

(1) data records with overflow pointers

(2) a free pointer to free area in the block

(3) a pointer to next logical block

(4) a BITMAP indicating whether the corresponding record is
deleted

The structure of a primary data area block is shown in
Fig. 3.5.


Blocks in the overflow data area have almost the same
structure except that instead of an overflow pointer we have
a next record pointer as depicted in Fig. 3.6.


Block level indexing has been adopted, i.e. the index
would have reference to the first record in the primary data
block. Only one level of indexing would be enough because
of the maximum file size limitation in a diskette.


The index block structure is shown in Fig. 3.7. The
first block in the file stores other control information
like record size, total number of records, number of records
per block, number of attributes and their respective sizes
in bytes, and key attributes, in addition to the index
entries, as shown in Fig. 3.8.
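
With a block level index, a look up first finds the primary block whose first key is the largest one not exceeding the wanted key. A minimal sketch, assuming the index is held in memory as a sorted list of (first key in block, block no.) pairs; the function name and that representation are hypothetical.

```python
from bisect import bisect_right


def find_block(index, key):
    """index: list of (first_key_in_block, block_no) pairs, sorted on the key."""
    first_keys = [k for k, _ in index]
    pos = bisect_right(first_keys, key) - 1   # last entry whose first key <= key
    if pos < 0:
        return None                           # key precedes every indexed record
    return index[pos][1]
```

Only the selected primary block, and any overflow records chained from it, then have to be read to decide whether the record exists.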


A record pointer in this structure has two parts: (i) a
block no. (ii) offset in the block.



INDEX BLOCK:          ! KEY VALUE ! NO. !

PRIMARY DATA AREA:    ! DATA ! OVERFLOW POINTER !

OVERFLOW DATA AREA:   ! DATA ! NEXT !


Fig. 3.4: ISAM file structure



BITMAP:  0 => VALID
         1 => DELETED


Fig. 3.8: Control Information
The same procedure as described in the previous section has
been adopted to get an integer value from the key fields.
The overflow blocks are also in the same file. No
particular order is maintained in the overflow area. The
records are not spanned over two different blocks.

A certain method is adopted to allocate an overflow
block for the file as described below.

Every fifth block in the FCB is left as an overflow
block. If they are full then a new block is acquired at the
current end of the file. This physical placement of
overflow blocks would minimise the delay and improve the
system response.

Standard algorithms for retrieval, insertion, deletion and
updating of a record have been adopted. The details of the
routines for this are discussed in the next chapter.

3.4 ACCESS METHODS INTERFACE


As a design criterion of a relational data base one of
the major issues is to provide the user an independent view
of the data, free from the underlying file structure. The
access methods interface (AMI) is the interface between the
user and the file structures, providing the structural
transparency. This is a set of standard functions provided
to the user to get access to the data stored in the data
base. There are ten basic functions in the AMI of the
current implementation. They are described in the following
lines.

(1) OPENR(CHNO, FNAME)

This function opens the relation name specified in FNAME and
assigns a channel number to it which is used in other
routines of AMI. For any legal operation on the data base
the relation has to be opened. This sets up all the control
structures for the corresponding file.

(2) GETNEXTRECORD(CHNO, ADRS)

This function reads the next tuple in the relation with
channel no. CHNO and puts the record starting from ADRS.

(3) PUTNEXTRECORD(CHNO, ADRS)

This function writes at the next position in the relation
with channel no. CHNO taking the record from the location
starting with ADRS.

(4) GETRECORD(CHNO, ADRS, KEYVAL)

This function retrieves the record with key value KEYVAL in
the data base and if found puts it at the place pointed to
by ADRS.

(5) INSERTRECORD(CHNO, ADRS)

This inserts a tuple into the relation with channel no. CHNO
taking the tuple pointed to by ADRS. It calculates the key
value from the record itself.

(6) DELETERECORD(CHNO, KEYVAL)

The record with key value KEYVAL is deleted from the
relation with channel no. CHNO by setting the corresponding
bit in the BITMAP.

(7) UPDATERECORD(CHNO, ADRS)

This function updates the tuple with key value as in the
record pointed to by ADRS and only updates the non key
fields of the record.

(8) MODIFYRECORD(CHNO, ADRS)

This is the same as the update function except that it
modifies the key fields too.

(9) CLOSER(CHNO)

This function closes the relation with channel no. CHNO for
any further operation on it.

(10) CREATER(FNAME, FLAG, ATRCOUNT, ADRS1, ADRS2)

This function creates a new relation with name as in FNAME
and the file structure as specified by FLAG. If flag=0 then
hashed structure is assumed, if flag=1 then ISAM file
structure is assumed, if flag=2 then one of the two
structures is chosen. The information about the no. of
attributes is taken from ADRS1 and about key attributes from
ADRS2. ATRCOUNT stands for total number of attributes in the
relation.
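
The idea of one interface over two organisations can be sketched as follows. This is a much simplified, in-memory illustration and not the AMI itself: the lower case function names mirror OPENR, CREATER, INSERTRECORD and GETRECORD but take Python values instead of addresses, keys are passed explicitly, and both backends are hypothetical stand-ins for the file structures of this chapter.

```python
from bisect import insort


class HashedRelation:
    """Stand-in for the hashed organisation: records kept in buckets."""
    def __init__(self, dir_size=17):
        self.buckets = [[] for _ in range(dir_size)]

    def insert(self, key, record):
        self.buckets[key % len(self.buckets)].append((key, record))

    def get(self, key):
        for k, rec in self.buckets[key % len(self.buckets)]:
            if k == key:
                return rec
        return None


class IsamRelation:
    """Stand-in for ISAM: records kept sorted on the key."""
    def __init__(self):
        self.rows = []

    def insert(self, key, record):
        insort(self.rows, (key, record))

    def get(self, key):
        for k, rec in self.rows:
            if k == key:
                return rec
        return None


channels = {}   # CHNO -> open relation


def creater(flag):
    # flag 0 selects hashed; any other value falls back to ISAM in this sketch
    return HashedRelation() if flag == 0 else IsamRelation()


def openr(relation):
    chno = len(channels)          # next unused channel number
    channels[chno] = relation
    return chno


def insertrecord(chno, key, record):
    channels[chno].insert(key, record)


def getrecord(chno, key):
    return channels[chno].get(key)
```

A caller that holds only the channel number issues the same calls whichever structure was chosen at creation time, which is the structural transparency the AMI is meant to give.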


CHAPTER 4


BASIC FUNCTIONAL LEVELS


4.1 FOUR BASIC LEVELS

In the implementation of the file structures discussed
in the previous chapter four distinct functional levels have
been identified. In this chapter the four different levels
are discussed.

4.2 SECTOR LEVEL


This is the foundation level for the other functional
levels that are built on top of it. In this level the set
of routines directly interact with the BDOS (Basic Disk
Operating System) module. As mentioned earlier BDOS
provides many basic file handling routines which were
required for building up the rest of the levels.
Essentially this level serves this purpose. This set of
routines provides basic diskette management routines. These
routines are not available to the users as these functions
would not carry much sense for them in terms of their
records. The current implementation on CDOS does not support
random file handling facility directly. A pseudo-random
facility has been provided on basic sequential files. In
these routines the unit of transfer is a sector (128 bytes).

Table 4.2.1 lists out basic routines in this level.

Table 4.2.1


        NAME            FUNCTION

(i)     CPMOPEN         Opens the file for any
                        meaningful operation.
(ii)    CPMMAKE         Creates a new file.
(iii)   CPMSEAF         Searches for a file
                        in directory.
(iv)    CPMSETDMA       Sets the DMA address to
                        a specified value.
(v)     CPMREADSEQ      Reads the next seq. sector
                        at current DMA address.
(vi)    CPMWRITESEQ     Writes the next seq.
                        sector from current DMA
                        address.
(vii)   READRAND        Reads the given logical
                        sector at specified adrs.
(viii)  WRITERAND       Writes the logical sector
                        from specified adrs.
(ix)    CPMDELETE       Deletes a file from dir.
(x)     CPMCLOSE        Closes the file from
                        further operation on it.
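
The text does not say how the pseudo-random facility is built on sequential files. One plausible way, shown here purely as an assumption, is to remember which sector the next sequential read will return, skip forward when the target lies ahead, and start again from the beginning when it lies behind. The class and method names are hypothetical and an in-memory stream stands in for the diskette file.

```python
import io

SECTOR = 128


class PseudoRandomFile:
    """Random sector reads built only from 'rewind' and 'read next sector'."""
    def __init__(self, stream):
        self.stream = stream
        self.next_sector = 0            # sector the next sequential read returns

    def _read_seq(self):
        data = self.stream.read(SECTOR)
        self.next_sector += 1
        return data

    def read_rand(self, n):
        if n < self.next_sector:        # target is behind us: start over
            self.stream.seek(0)
            self.next_sector = 0
        while self.next_sector < n:     # skip forward sector by sector
            self._read_seq()
        return self._read_seq()
```

Under this scheme a read that moves backwards costs a rewind plus n sequential reads, which is one more reason to keep frequently used blocks in memory buffers.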


4.3 BLOCK LEVEL


For any meaningful operation on a data base a sector
wise transfer would cost a lot in terms of the secondary
storage accesses. In order to minimise the secondary
storage accesses a level with a higher unit is called for.
This level has this purpose to serve. A block (1K bytes) has
been chosen as the unit of transfer for this set of
procedures as the file is stored in terms of blocks and the
size is reasonable enough to reduce the secondary storage
accesses. It provides almost all routines as in the
previous level. Table 4.3.1 summarizes the set of routines
provided by this level. This level is also transparent to
the outside users.


Table 4.3.1


        NAME            FUNCTION

(i)     BOPEN           Block level open procedure.
(ii)    BREADSEQ        Reads the next seq. block
                        in FCB into buffer.
(iii)   BREADRAND       Reads the specified block
                        (logical) in FCB into buf.
(iv)    BWRITESEQ       Writes into next seq.
                        block in FCB from buffer.
(v)     BWRITERAND      Writes into the specified
                        block in FCB from buffer.
(vi)    BINSERT         Inserts a block at given
                        place in FCB.
(vii)   BDELETE         Deletes the specified
                        block from FCB.
(viii)  BCLOSE          Block level close routine.


4.4 RECORD LEVEL


This is the logical record level. A record is the
logical unit of access for the user. The user does not
bother about the system dependent features like sector size
or block size. His view of the data base is in terms of
logical records. This level provides that interface to the
data base. This set contains all basic routines to
manipulate a data base like fetch, insert, delete, update,
modify. The manipulation procedures being structure
dependent, this set contains two sets of procedures, one for
each file organization described in the previous chapter.
Table 4.4.1 describes the procedures and the associated
functions. For this level the unit of transaction as stated
earlier is a record. This level makes extensive use of the
previous two levels. A user with the knowledge of the
underlying file structures can use these routines.


Table 4.4.1


        NAME            FUNCTION

(i)     -OPEN           Opens the file and sets up
                        related data structures.
(ii)    -READSEQ        Reads the next logical rec.
(iii)   -WRITESEQ       Writes into the next
                        logical record.
(iv)    -FETCH          Fetches the record with
                        specified key value.
(v)     -INSERT         Inserts the given record
                        at proper place.
(vi)    -DELETE         Deletes the record with
                        specified key value.
(vii)   -UPDATE         Updates the non-key fields
                        of the specified record.
(viii)  -MODIFY         Modifies the whole record.
(ix)    -CLOSE          Closes the file and the
                        associated data structure.

( - could be either HASH or ISAM)


4.5 RELATIONAL LEVEL


This is the top most level and uses the access method
interface. The way it differs from the AMI is in terms of its
ability to handle secondary index access. In this level the
unit of access is a logical record. However, this level has
not been implemented and the details of the routines are to
be decided when files based on secondary indices are
provided.



CHAPTER 5 


BUFFER MANAGEMENT 


3*1 NEED FOR BUFFERS 


A microcomputer typically has 64K bytes of main
memory. As stated earlier, the unit of transfer between
diskette and the main memory is a sector of 128 bytes. An
access to the diskette is the costliest operation time-wise.
To overcome these limitations a buffer scheme is called for.
The scheme would load a block of data at a time rather than
a sector so that seeking is minimised. As the main memory
cannot provide an unlimited number of buffers, a fixed
number of buffers are kept in a central pool and shared by
all files in operation.


In the current implementation there is a pool of 32 1K
byte buffers. The size of the buffer is chosen as 1K bytes
to match the files in CP/M, which are stored in terms of
blocks. A buffer allocation algorithm must decide on a
strategy for allocation of buffers to the files. The data
structures used for this and the buffer allocation algorithm
are given below.
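
The sizes quoted above can be checked with a few lines of arithmetic. The following is only an illustrative sketch in Python (not the implementation language of the thesis); the constant and function names are hypothetical, and only the three numeric values come from the text.

```python
# Values taken from the text; the names themselves are illustrative.
SECTOR_BYTES = 128     # unit of transfer between diskette and main memory
BUFFER_BYTES = 1024    # one 1K buffer, matching a block
POOL_BUFFERS = 32      # number of buffers in the central pool


def sectors_per_buffer(buffer_bytes=BUFFER_BYTES, sector_bytes=SECTOR_BYTES):
    """Number of sector transfers needed to fill one buffer."""
    if buffer_bytes % sector_bytes != 0:
        raise ValueError("buffer size must be a whole number of sectors")
    return buffer_bytes // sector_bytes


def pool_bytes(n_buffers=POOL_BUFFERS, buffer_bytes=BUFFER_BYTES):
    """Total main memory taken by the buffer pool, in bytes."""
    return n_buffers * buffer_bytes
```

With these values one buffer holds eight sectors, so a block load replaces eight separate sector requests.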
5.2 DATA STRUCTURES

In order to utilise the buffers efficiently, good
book-keeping of the status of each buffer over the query
operation is required. For a typical query, a maximum of 16
different relations has been assumed to be involved. To
maintain the status of each buffer, different tables have
been constructed. This section describes the tables in
detail.

There are mainly three tables in operation, namely (i)
Table of Channels (TOC), (ii) Table of Files (TOF), (iii)
Table of Buffers (TOB).

(i) Table Of Channels (TOC)

The structure of the table is shown in Fig. 5.1. Any
file, at the time of being opened, seeks an entry in
this table. The table is indexed by CHNO, which is the channel
number assigned to a successfully opened file. A channel
number uniquely determines the file, but not vice versa. This
table keeps information about the current logical disk
block in use and the offset in the block through two fields
called 'CURDISKBLK' and 'OFFSET' respectively. FNO and BNO
are two more fields in the table and are index numbers into
the corresponding entries in the other two tables to be
discussed. The last field in the table specifies whether
the channel number is active or closed. The table has a
size of 16 entries. A function ALLOCCHANNEL allocates
entries from this table at the time of file opening.


+-----+------------+--------+-----+--------+
| FNO | CURDISKBLK | OFFSET | BNO | CLOSED |
+-----+------------+--------+-----+--------+

Fig. 5.1 : Table Of Channels


(ii) Table Of Files (TOF)

Once a file is opened, the relevant information about it
is stored in this table as shown in Fig. 5.2. This table
is indexed by FNO, which appears as a field in the other two
tables. The FCB (File Control Block) of the file is stored
in this table in the FCB field. CHNO, the channel number
that has been assigned to this file, is the second field.
The number of buffers from the central pool that the relation
has in its possession is indicated by the BUFCOUNT field. This
is used in the buffer allocation algorithm. The FTYPE field
stores information regarding the type of file, whether hashed
or ISAM. As before, the last field is the indicator for the
closed relations. There are 16 entries in this table. A
function ALLOCFNO allocates entries from this table at the
time of opening the relation.


+-----+------+----------+-------+--------+
| FCB | CHNO | BUFCOUNT | FTYPE | CLOSED |
+-----+------+----------+-------+--------+

Fig. 5.2 : Table Of Files


(iii) Table Of Buffers (TOB)

The structure of this table is shown in Fig. 5.3.
This table keeps track of all the buffers. There are 32
entries in this table, one for each buffer of the
buffer pool. BUFNO is the field indicating which buffer of
the pool is allocated to this entry. This table is indexed
by BNO, which has already appeared in the other two tables.
This table stores the corresponding CHNO and FNO of the
relation in the CHNO and FNO fields. DISKBNO indicates the
logical block number in the FCB. The UPDATED flag indicates
whether the buffer has been written into after it was last
read from the diskette. The last field in the table indicates
the buffers that are free and available for use.

A function ALLOCBUF is available which allocates buffers
to a channel number as and when it asks for one, according
to the buffer allocation algorithm to be discussed in the next
section.


+-------+------+-----+---------+---------+----------+
| BUFNO | CHNO | FNO | DISKBNO | UPDATED | FINISHED |
+-------+------+-----+---------+---------+----------+

Fig. 5.3 : Table Of Buffers


The three tables are maintained by all the top three
levels of the last chapter. The sector level functions that
do not need buffers ignore these tables.
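
The three tables can be pictured as fixed-size arrays of records. The sketch below is an illustration in Python rather than the PASCAL used for the implementation. The field names and table sizes follow the description above; the field types, the 1-based numbering with 0 meaning "none", and the behaviour of alloc_channel when the table is full are assumptions made only for this example.

```python
from dataclasses import dataclass

MAX_CHANNELS = 16   # entries in the Table Of Channels
MAX_FILES = 16      # entries in the Table Of Files
POOL_BUFFERS = 32   # entries in the Table Of Buffers


@dataclass
class TocEntry:                 # indexed by CHNO
    fno: int = 0                # index into the Table Of Files
    curdiskblk: int = 0         # current logical disk block in use
    offset: int = 0             # offset within that block
    bno: int = 0                # index into the Table Of Buffers
    closed: bool = True         # True: entry not in use


@dataclass
class TofEntry:                 # indexed by FNO
    fcb: bytes = b""            # File Control Block (type assumed)
    chno: int = 0               # channel assigned to this file
    bufcount: int = 0           # buffers currently held from the pool
    ftype: str = ""             # "HASH" or "ISAM" (encoding assumed)
    closed: bool = True


@dataclass
class TobEntry:                 # indexed by BNO
    bufno: int = 0              # pool buffer attached to this entry
    chno: int = 0
    fno: int = 0
    diskbno: int = 0            # logical block number in the FCB
    updated: bool = False       # written into since last read from diskette
    finished: bool = True       # True: buffer free and available


def make_tables():
    """Build the three tables with every entry closed or free."""
    toc = [TocEntry() for _ in range(MAX_CHANNELS)]
    tof = [TofEntry() for _ in range(MAX_FILES)]
    tob = [TobEntry() for _ in range(POOL_BUFFERS)]
    return toc, tof, tob


def alloc_channel(toc):
    """Hypothetical stand-in for the channel allocation function:
    mark the first closed entry active and return its 1-based CHNO,
    or return 0 when all channels are in use (assumed convention)."""
    for i, entry in enumerate(toc):
        if entry.closed:
            entry.closed = False
            return i + 1
    return 0
```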


5.3 BUFFER ALLOCATION ALGORITHM


As and when a relation requires a buffer, it asks for
one from the central pool through the ALLOCBUF function.
Depending on the availability of the buffers and the constraints
put, buffers are allotted from the pool. The strategy in
case of non-availability of buffers is to penalize the file
with the maximum number of buffers by removing one buffer from
it. A minimum of two buffers are allocated to any opened
file without any constraint. Allocation beyond two is decided
by the algorithm given below.


INPUT: A valid channel no.

OUTPUT: An index to the pool of buffers, or zero indicating
that a buffer cannot be allotted to it. In that case it has to
use its own buffers acquired before.


1. If TOF.BUFCOUNT >= 2 then GOTO step 3 else CONTINUE. Either
the relation is going to be opened (no buffers with it) or
it has one buffer and now requires another one. In either
case it has to be allotted one buffer, as the CHANNEL NO.
being valid implies that there are still less than 16
relations in operation.

Search for a buffer index that does not occur in TOB.BUFNO.
IF found then allocate it to the channel and STOP ELSE GOTO
step 2.


2. There is at least one relation with more than two
buffers with it. Search for that entry in the TOF which has
the maximum number of buffers, with the help of BUFCOUNT. It is
the one to be penalised. Get the corresponding channel no.
of the entry found above. In TOB, for the corresponding
channel no., find the buffer that would involve least work,
i.e. a buffer which has not been updated in that
channel no. If not found, then take the first entry
corresponding to the channel no., write it back onto the
disk, assign this buffer to the channel needing one and
STOP. If a buffer without update is available then allocate
the corresponding BUFNO and STOP.


3. If only free buffers are available then allocate, else return
0. Search for an index that does not occur in TOB.BUFNO.
If found, allocate the same and STOP;
else return a zero indicating that a buffer is not available
and STOP.
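
The three steps above can be sketched as follows. This is a minimal illustration in Python, not the ALLOCBUF routine of the implementation. Several simplifications are assumptions of the sketch: the Table Of Buffers is reduced to a dictionary keyed by buffer number, BUFCOUNT is kept per channel rather than per file entry, a free buffer is taken to be the lowest unused number, "the first entry" of the penalised channel is taken to be its lowest-numbered buffer, and write_back is a hypothetical callback standing for the write to the diskette.

```python
POOL_BUFFERS = 32


def free_buffer(tob):
    """Lowest buffer number (1-based) not occurring in TOB.BUFNO, or 0."""
    for bufno in range(1, POOL_BUFFERS + 1):
        if bufno not in tob:
            return bufno
    return 0


def allocbuf(chno, tob, bufcount, write_back):
    """Return a buffer number for channel chno, or 0 if none can be given.

    tob      : dict BUFNO -> {"chno": int, "updated": bool}
    bufcount : dict CHNO -> number of buffers held
    """
    def grant(bufno):
        tob[bufno] = {"chno": chno, "updated": False}
        bufcount[chno] = bufcount.get(chno, 0) + 1
        return bufno

    bufno = free_buffer(tob)

    # Step 3: the channel already has its guaranteed two buffers,
    # so it only gets a free buffer, otherwise zero.
    if bufcount.get(chno, 0) >= 2:
        return grant(bufno) if bufno else 0

    # Step 1: fewer than two buffers held; a free buffer is used if any.
    if bufno:
        return grant(bufno)

    # Step 2: pool exhausted; penalise the channel holding most buffers.
    victim = max(bufcount, key=bufcount.get)
    held = sorted(b for b, e in tob.items() if e["chno"] == victim)
    clean = [b for b in held if not tob[b]["updated"]]
    if clean:
        bufno = clean[0]            # least work: nothing to write back
    else:
        bufno = held[0]
        write_back(bufno)           # flush before handing the buffer over
    bufcount[victim] -= 1
    del tob[bufno]
    return grant(bufno)
```

For instance, if one channel has taken all 32 buffers, a request from a second channel takes over one of the first channel's non-updated buffers without any diskette write; only when every candidate buffer has been updated is one written back first.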



CHAPTER 6 


PERFORMANCE ANALYSIS


6.1 PERFORMANCE MEASURES


In this chapter, performance analysis of the two file
structures that have been designed is carried out. To
compare the two file structures the following parameters
have been chosen:

(i) Average effective record size : R

(ii) Time to fetch a record given the key value : Tf

(iii) Time to get the next record within the file : Tn

(iv) Time to insert a record : Ti

(v) Time to update a record : Tu

Deleting a record is assumed to be just updating (setting)
the corresponding delete bit in BITMAP, and is hence not
considered as a measure.


The assumptions made and the symbols used in the
derivation are given below:

(1) Record size = R bytes

(2) Block size = B bytes

(3) Unspanned records, i.e. records are not split over two
blocks

(4) Total no. of records in the file = n records

(5) Seek time = s, i.e. the time required to position the head
on the right track

(6) Rotational latency = r, i.e. the time to locate the sector

(7) A block pointer size = P bytes

(8) A record pointer P' = P + Offset size

(9) Instantaneous transfer rate = t

(10) Block transfer time = btt = B/t

(11) Bulk transfer rate = t' (takes into account sector gap,
blocking, seek)

(12) The time to open a file, the computation time to maintain
the tables and the setting up of the control data structures
are negligible and are ignored.

(13) Total no. of attributes in a record = a

(14) Average size of each attribute = V bytes

(15) The index or the bucket directory is located in the
main memory

(16) Bucket directory size (in hashed files) = d

(17) Computation time = c

(18) Expected overflow chain length = Lc

(19) Probability of the record being in current block = p


6.2 AVERAGE EFFECTIVE RECORD SIZE

(a) HASHED
    R = R1 + R2 + R3

    where R1 = Actual size of attributes = aV bytes
          R2 = Control information per record = (B/n)
          R3 = Wasted space overhead per record
             = (B - Floor(B/aV)*aV) / (Floor(B/aV))

(b) ISAM
    R = R1 + R2 + R3 + R4

    where R1 = Actual size of record = aV bytes
          R2 = Record pointer = P'
          R3 = Space occupied by index blocks per record = ( ... /n)
          R4 = Wastage overhead per record
             = (B - Floor(B/(aV+P'))*(aV+P')) / (Floor(B/(aV+P')))
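
As a numerical illustration, the effective record size can be evaluated as in the sketch below. It is written in Python and is not part of the implementation; the function names are hypothetical. It reads the hashed case as R = aV + B/n + waste and the ISAM case as R = aV + P' + R3 + waste, where the waste per record is the unused tail of a block shared among the records that fit in it. Because the index-space term R3 of the ISAM case is not fully legible in the source, it is passed in as a parameter instead of being computed.

```python
def wasted_per_record(B, rec):
    """Unused bytes at the end of a block, shared among the unspanned
    records of size rec that fit in a block of B bytes."""
    per_block = B // rec                 # Floor(B / rec)
    if per_block == 0:
        raise ValueError("record does not fit in one block")
    return (B - per_block * rec) / per_block


def hashed_record_size(a, V, B, n):
    """R = R1 + R2 + R3 for the hashed organization."""
    r1 = a * V                           # actual size of attributes
    r2 = B / n                           # control information per record
    r3 = wasted_per_record(B, r1)
    return r1 + r2 + r3


def isam_record_size(a, V, B, p_rec, r3_index):
    """R = R1 + R2 + R3 + R4 for ISAM; p_rec is the record pointer size P'
    and r3_index is the index space per record, supplied by the caller."""
    r1 = a * V
    r4 = wasted_per_record(B, r1 + p_rec)
    return r1 + p_rec + r3_index + r4
```

For example, with a = 5, V = 20, B = 1024 and n = 1024, ten records of 100 bytes fit in a block, 24 bytes are left over, and the hashed effective record size is 100 + 1 + 2.4 = 103.4 bytes.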


6.3 TIME TO FETCH A RECORD

(a) HASHED
    Tf = T1 + T2 + T3

    where T1 = Computation time to get the pointer to block = c
          T2 = Reading block(s) in the bucket
             = time to read average no. of blocks in a bucket / 2
             = (n/d)*(s+r+btt) / (2*(Floor(B/aV)))
          T3 = Search time in the blocks of the bucket
             = O(n' Log n')

(b) ISAM
    Tf = T1 + T2 + T3

    where T1 = Time to get the block no. from index = c
          T2 = Time to read the block(s) in the chain
             = Expected overflow chain length * time to read a block
             = Lc*(s+r+btt)
          T3 = Time to search in blocks
             = O(n' Log n')
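
The fetch times can be evaluated numerically as in the following Python sketch, which is an illustration only and uses hypothetical function names. It reads the hashed case as Tf = c + (n/d)*(s+r+btt)/(2*Floor(B/aV)) + T3 and the ISAM case as Tf = c + Lc*(s+r+btt) + T3, with btt = B/t. The in-memory search term T3 is given in the text only as an order of growth, so it is passed in as a plain number (the parameter search), which is an assumption of the sketch.

```python
def block_read_time(s, r, B, t):
    """Time to read one block: seek + rotational latency + btt, btt = B/t."""
    return s + r + B / t


def hashed_fetch_time(c, n, d, a, V, B, s, r, t, search):
    """Tf for the hashed organization.

    n/d records per bucket, Floor(B/aV) records per block, and on
    average half of the bucket's blocks are read before the hit."""
    per_block = B // (a * V)
    if per_block == 0:
        raise ValueError("record does not fit in one block")
    t2 = (n / d) * block_read_time(s, r, B, t) / (2 * per_block)
    return c + t2 + search


def isam_fetch_time(c, Lc, B, s, r, t, search):
    """Tf for ISAM: index lookup, Lc block reads along the chain, search."""
    return c + Lc * block_read_time(s, r, B, t) + search
```

With s = 10, r = 5, B = 1000 and t = 100 (any consistent units), one block read costs 25; a hashed file of n = 1000 records of 100 bytes in d = 10 buckets then needs 1 + 125 + 2 = 128 for c = 1 and a search cost of 2, while an ISAM fetch with Lc = 2 needs 1 + 50 + 2 = 53.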

6.4 TIME TO GET NEXT RECORD

(a) HASHED

    Tn = Tf if the key value of the next record is known
    else
       = reading next record in buffer
       = p*c + (1-p)*(s+r+btt)

(b) ISAM

In this case it would depend on where the last record
was. There are four possibilities:

(i) The last record and the next record are in primary area.

    Tn = p*c + (1-p)*(s+r+btt)

(ii) The last record was in primary and the next record is in
overflow area.

    Tn = c + (s+r+btt)

(iii) The last record was in overflow area and the next record
is in primary area.

    Tn = c + (s+r+btt)

(iv) The last record and the next record are in overflow
area.

    Tn = p*c + (1-p)*(s+r+btt)
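
The four ISAM cases reduce to two expressions, which the sketch below makes explicit. It is a Python illustration with a hypothetical function name and area labels: when the last and the next record lie in the same area, the next record is in the current block with probability p and costs only c, otherwise a block is read; when they lie in different areas a block read is always needed.

```python
def next_record_time(c, p, s, r, btt, last_area="primary", next_area="primary"):
    """Tn for ISAM, given the areas ("primary" or "overflow") holding the
    last and the next record."""
    for area in (last_area, next_area):
        if area not in ("primary", "overflow"):
            raise ValueError("area must be 'primary' or 'overflow'")
    read = s + r + btt                   # one block read
    if last_area == next_area:           # cases (i) and (iv)
        return p * c + (1 - p) * read
    return c + read                      # cases (ii) and (iii)
```

The same-area expression is also the one used for the hashed organization when the key of the next record is not known.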



6.5 TIME TO INSERT A RECORD

(a) HASHED
    Ti = T1 + T2 + T3

    where T1 = Locate the free pointer from bucket directory = c
          T2 = Read the block into buffer = (s+r+btt)
          T3 = Write into buffer = c

(b) ISAM

    Ti = Time to locate the place + write into buffer = Tf + c


6.6 TIME TO UPDATE A RECORD

In case only the non-key fields are updated:

(a) HASHED

    Tu = Locate the record + write into buffer
       = Tf + c

(b) ISAM

    Tu = Locate the record + write into the buffer
       = Tf + c

In case the key fields are being updated, the time in both cases
would be given by

    T = Ti + Tu


In the above derivations we have assumed a very simple
model. For a floppy based system, many other parameters
come into the picture, such as head loading time, head settling
time, motor start time etc. among the hardware parameters, and
operating system characteristics in handling the file and the
pattern of access by the user program. But incorporating
all these would make the model very complicated. An
interesting experimental model taking into account all such
parameters has been described in [PECHU 83].


CHAPTER 7 


CONCLUSION 


7.1 SUMMARY


In this thesis the major goal has been to implement an
efficient storage structure for RDB/M, a relational data
base for microcomputers with CP/M as the operating system.

Two types of file organisations have been provided, viz.
(i) HASHED (ii) ISAM. The structures of these two files for
the particular implementation have been discussed. In
implementing the file system four basic functional levels
have been identified. The basic two levels are common to
any file structure that would be included in future.

To upgrade system performance a buffer management
scheme has been adopted. The details of the data structures
used and the buffer allocation algorithm are discussed.

A performance analysis of both the file structures has
been carried out analytically. A few performance measures
have been chosen and computed for both.



The implementation has been carried out on CROMEMCO
SYSTEM THREE, on its CDOS operating system. Information
regarding CDOS-CP/M compatibility is available in [CDOSUM].
The language used for the implementation is a special
version of PASCAL called PASCAL/MT [MTPAS].

7.2 SCOPE FOR FUTURE WORK


This has been an initial attempt to implement a
relational data base on microcomputers. As a first step
towards this, four important features of a relational data
base have been chosen. There are a lot more things to be done
before it could become commercially usable.

As an initial extension, modifications in the present
implementation may be made to incorporate WINCHESTER drives as
secondary storage devices.

In the storage structure, protection of the data base
from unauthorized use is to be provided. A standard model
to extend the existing structure for concurrency control can
be thought of. Facilities for selective access control and
for providing user views and snapshots are going to be some
interesting features for implementation. Some other typical
secondary key based file structures should be provided.

Only an analytical performance analysis has been
carried out. An experimental model to measure these for a
particular system, as suggested in [PECHU 83], could be carried
out to measure different performance measures.
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